Checkpointing with Multicast Communication

Author

  • James E. Lumpp
Abstract

For long-running or large-scale distributed programs, the ability to provide software fault tolerance via checkpointing is valuable. For scalable systems, multicast communication is becoming a predominant communication paradigm. While some aspects of consistency and channel state are the same for both unicast and multicast protocols, the implementation of checkpointing systems differs. This paper explores the problem of checkpointing in a multicast environment and introduces two checkpointing algorithms for such environments. The first algorithm is closely based on existing checkpointing algorithms. The second employs the multicast protocol to distribute checkpointing information efficiently.

Similar resources

Row/Column-First: A Path-based Multicast Algorithm for 2D Mesh-based Network on Chips

In this paper, we propose a new path-based multicast algorithm called the Row/Column-First algorithm. The proposed algorithm constructs a set of multicast paths to deliver a multicast message to all multicast destination nodes. The multicast paths all belong to the row-first or column-first subcategories to maximize the multicast performance. The selection of row-first or column-first appro...

Multicast computer network routing using genetic algorithm and ant colony

Due to the growth and development of computer networks, the importance of routing has increased. The importance of multicast networks is not negligible nowadays. Many multimedia programs need to use a communication link to send a packet from a sender to several receivers. To support such programs, there is a need to make an optimal multicast tree to indicate the opt...

Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities to monitor physical or environmental conditions. There are a number of challenges in WSNs because of limited battery power, communications, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...

Communication Pattern Based Checkpointing Coordination for Fault-tolerant Distributed Computing Systems

This paper presents a new checkpointing coordination scheme which utilizes the communication pattern of the cooperating processes. In the proposed scheme, checkpointing is coordinated for a limited number of processes based on information about the communication pattern of the target program. Unlike the previous solutions which do not utilize the communication pattern, it is possi...

Compiler Supported Interval Optimisation for Communication Induced Checkpointing

There are three main approaches to checkpoint-based recovery for distributed systems: coordinated checkpointing, uncoordinated checkpointing and communication induced checkpointing. It can be shown that communication induced checkpointing theoretically has the least minimum overhead, but also that the effective overhead depends on the communication behaviour and the res...

Publication date: 1998